Maximum margin semi-supervised learning with irrelevant data

Authors

  • Haiqin Yang
  • Kaizhu Huang
  • Irwin King
  • Michael R. Lyu
Abstract

Semi-supervised learning (SSL) is a typical learning paradigm that trains a model from both labeled and unlabeled data. Traditional SSL models usually assume that the unlabeled data are relevant to the labeled data, i.e., that they follow the same distribution as the targeted labeled data. In this paper, we address a different, yet formidable scenario in semi-supervised classification, where the unlabeled data may contain data irrelevant to the labeled data. To tackle this problem, we develop a maximum margin model, named the tri-class support vector machine (3C-SVM), to utilize the available training data while seeking a hyperplane that separates the targeted data well. Our 3C-SVM exhibits several characteristics and advantages. First, it does not need any prior knowledge or explicit assumption about the data relatedness. On the contrary, it can relieve the effect of irrelevant unlabeled data based on the logistic principle and the maximum entropy principle. That is, 3C-SVM approaches an ideal classifier: one that relies heavily on labeled data and is confident on the relevant data lying far away from the decision hyperplane, while maximally ignoring the irrelevant data, which are hard to distinguish. Second, theoretical analysis is provided to show under what conditions the irrelevant data can help to seek the hyperplane. Third, 3C-SVM is a generalized model that unifies several popular maximum margin models, including standard SVMs, semi-supervised SVMs (S³VMs), and SVMs learned from the universum (U-SVMs), as its special cases. More importantly, we deploy a concave-convex procedure to solve the proposed 3C-SVM, transforming the original mixed integer program into a semi-definite programming relaxation, and finally into a sequence of quadratic programming subproblems, which yields the same worst-case time complexity as that of S³VMs. Finally, we demonstrate the effectiveness and efficiency of our proposed 3C-SVM through systematic experimental comparisons.
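
The abstract does not give the 3C-SVM objective, so the following is only a rough sketch of the per-sample losses commonly associated with the three special cases it names: the hinge loss for labeled data in standard SVMs, the symmetric hinge loss for unlabeled data in S³VMs, and an epsilon-insensitive loss for universum data in U-SVMs. The function names, the linear scoring function, and the default `eps` value are illustrative assumptions, not the authors' formulation.

```python
def score(w, b, x):
    """Linear decision value f(x) = <w, x> + b (hypothetical helper)."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def hinge(y, f):
    """Standard SVM loss for a labeled point with y in {-1, +1}."""
    return max(0.0, 1.0 - y * f)


def symmetric_hinge(f):
    """S3VM-style loss for an unlabeled point: penalizes scores inside the margin."""
    return max(0.0, 1.0 - abs(f))


def eps_insensitive(f, eps=0.1):
    """U-SVM-style loss for a universum point: penalizes scores far from the hyperplane.

    eps is an assumed tolerance band around the decision boundary.
    """
    return max(0.0, abs(f) - eps)
```

Under these assumptions, a relevant unlabeled point is pushed away from the hyperplane by `symmetric_hinge`, while an irrelevant point is pulled toward it by `eps_insensitive`. A tri-class model in the spirit of the abstract would have to decide, per unlabeled point, which of the two penalties applies.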

Similar resources

Deep Transductive Semi-supervised Maximum Margin Clustering

Semi-supervised clustering is a very important topic in machine learning and computer vision. The key challenge of this problem is how to learn a metric such that instances sharing the same label are more likely to be close to each other in the embedded space. However, little attention has been paid to learning better representations when the data lie on a non-linear manifold. Fortunately, deep lear...

Maximum Margin Semi-Supervised Learning for Structured Variables

Many real-world classification problems involve the prediction of multiple inter-dependent variables forming some structural dependency. Recent progress in machine learning has mainly focused on supervised classification of such structured variables. In this paper, we investigate structured classification in a semi-supervised setting. We present a discriminative approach that utilizes the intri...

Nonparametric Maximum Margin Similarity for Semi-Supervised Learning

Nonparametric Label Propagation (LP) has been proven to be effective for semi-supervised learning problems, and it predicts the labels for unlabeled data by a harmonic solution of an energy minimization problem which encourages local smoothness of the labels in accordance with the similarity graph. On the other hand, the success of LP algorithms highly depends on the underlying similarity...

Minimum Density Hyperplanes

Associating distinct groups of objects (clusters) with contiguous regions of high probability density (high-density clusters) is a central assumption in statistical and machine learning approaches for the classification of unlabelled data. In unsupervised classification this cluster definition underlies a nonparametric approach known as density clustering. In semi-supervised classification, cl...

Synthesis of maximum margin and multiview learning using unlabeled data

In this presentation we show that semi-supervised learning with two input sources can be transformed into a maximum margin problem similar to a binary SVM. Our formulation exploits the unlabeled data to reduce the complexity of the class of learning functions. To measure how much the complexity is decreased, we use Rademacher complexity theory. The corresponding optimization pro...

Journal:
  • Neural Networks: The Official Journal of the International Neural Network Society

Volume 70

Publication date 2015